Abstract 

Background: Personalized drug prescription can benefit from intelligent information management 
and sharing. International standard classifications and terminologies have been developed to provide unique 
and unambiguous information representation. Such standards can serve as the basis of automated decision 
support systems for discovering drug-drug and drug-disease interactions. Additionally, Semantic Web 
technologies have been proposed in earlier works to support such systems. 

Results: The paper presents Panacea, a semantic framework capable of offering drug-drug and drug-disease 
interaction discovery. To enable this kind of service, medical information and terminology had to be translated to 
ontological terms and appropriately coupled with medical knowledge of the field. International standard 
classifications and terminologies provide the backbone of the common representation of medical data, while the 
medical knowledge of drug interactions is represented by a rule base which makes use of the aforementioned 
standards. Representation is based on a lightweight ontology. A layered reasoning approach is implemented: at 
the first layer, ontological inference is used to discover underlying knowledge, while at the second layer a 
two-step rule selection strategy is followed, resulting in a computationally efficient reasoning approach. Details of the 
system architecture are presented, together with an outline of the difficulties that had to be overcome. 

Conclusions: Panacea is evaluated both in terms of the quality of its recommendations against real clinical data and 
in terms of performance. The quality evaluation gave useful insights regarding requirements for real-world deployment 
and revealed several parameters that affected the recommendation results. Performance-wise, Panacea is compared 
to a previously published work by the authors, a service for drug recommendations named GalenOWL; the paper presents 
their differences in modeling and approach to the problem and pinpoints the advantages of Panacea. 
Overall, the paper presents a framework for providing an efficient drug recommendation service in which Semantic 
Web technologies are coupled with traditional business rule engines. 

Keywords: Ontologies, Decision support, Rule-based reasoning, Drug recommendations 



Background 

One of the health sectors where intelligent information 
management and information sharing are valuable 
preconditions for the delivery of top-quality services is 
personalized drug prescription. This is more evident in 
cases where more than one drug must be prescribed, 
a situation which is not uncommon, as drug interactions 
may appear. The problem is magnified by the wide 
range of available drug substances in combination with the 
various excipients in which the former are present. 

If one takes into account that there exist more than 
18,000 pharmaceutical substances, including their excipients, 
then it is clear that keeping health care professionals 
continuously up to date is remarkably hard. On top of this, 
the extensive literature makes the discovery of relevant 
information a time-consuming and difficult process, while the 
different terminologies used across sources add 
to the burden on medical professionals studying 
the available information. 

Semantic Web technologies can play an important role 
in the structural organization of the available medical 
information in a manner which will enable efficient 
discovery and access. Research projects funded to bring 
Semantic Web technologies to diagnostic and 
therapeutic procedures exist, such as REMINE [1], PSIP 
[2], NeOn [3] and Active Semantic Documents [4], as do 
works such as [5], but they do not fully address the 
problem of automated drug prescription using drug-drug and 
drug-disease interactions. 

Rule-based approaches have been proposed for addressing 
issues relating to biomedical ontologies research. It 
is common for ontologies written in expressive Semantic 
Web languages such as OWL a to be unable to 
handle all requirements for capturing the knowledge in 
several biomedical and medical domains. As a method 
for enriching the expressiveness of ontology languages, 
researchers have proposed the use of rules which act upon 
the defined ontological knowledge. According to [6], rules 
are helpful in the following situations relating to biomedical 
ontologies: defining "standard rules" for chaining 
ontology properties, "bridging rules" for reasoning across 
different domains, "mapping rules" for defining mappings 
between ontology entities and "querying rules" for 
expressing complex queries upon ontologies. The author 
gives a thorough review of RuleML b and SWRL c, the two 
major ontology rule languages, the available rule formation 
tools and the reasoners. Golbreich et al. [7] make 
use of the outcomes of the previous paper to showcase 
the need for rules in biomedical applications with a use 
case of a brain anatomy definition, where a brain structure 
ontology is defined in OWL but rules describing the 
relationships between the properties and entities are 
needed for correct annotation of MRI images. Another 
work citing the need for semantically enriched rules, 
where an ontology is coupled with SWRL rules for annotating 
pseudogenes and answering research questions, has 
been proposed in [8]. All the above papers present the 
need for extending ontologies with rules in order to capture 
the knowledge of complex biomedical domains. 

The paper presents Panacea, a semantic-enabled system 
for discovering drug recommendations and interactions. 
Panacea is based on experiences and lessons 
drawn from the development of GalenOWL [9], a similar 
system which had Semantic Web technologies at 
its core. As such, Panacea can be considered the evolution 
of GalenOWL in terms of design and scalability. 
Panacea makes use of established and standardized medical 
terminologies together with a rich knowledge base 
of drug-drug and drug-disease interactions expressed as 
rules. Panacea is implemented with scalability, 
completeness of results and responsiveness in query 
answering in mind. 

Standard terminologies and semantic web 

Standard terminologies and classifications in the medical 
domain have been developed in order to support 
information sharing and exchange and to enable a common 
expression of key concepts. One example is the 
ICD-10 d (International Classification of 
Diseases) index of the World Health Organization (WHO), 
which "is used to classify diseases and other health problems 
recorded on many types of health and vital records" 
across many countries. The classification is also used for 
the storage and retrieval of diagnostic information and for 
the compilation of national statistics reports by the WHO 
members. 

On the other hand, ontologies and the Semantic Web 
enable a common representation and understanding of 
knowledge. Ontologies can effectively capture a domain's 
knowledge by "specifying the definitions of terms by 
describing their relationships with other terms". A reasoner 
can be employed upon an ontology in order to 
uncover implicitly defined information while the expres- 
siveness of ontologies can be further enriched by formu- 
lating rules in standard rule languages, such as RuleML 
or SWRL that are mentioned above, thus not sacrificing 
interoperability. 

Panacea aims to combine and make use of the benefits 
of standard terminologies and Semantic Web technolo- 
gies by enabling inference and rule-based reasoning on 
ontologies that have been expressed using the medical 
standards. 

Methods 

Architecture and functional design 

The purpose of Panacea is to provide drug prescription 
recommendations based on a patient's medical record, i.e. 
to advise physicians to prescribe medications according to 
the drugs' active substance indications and contraindications. 
For details regarding the initiative that triggered 
development of Panacea and the initial medical and phar- 
maceutical data that were available, the reader is encour- 
aged to read [9]. 

Panacea follows a layered reasoning process which is 
depicted in Figure 1. During the start-up of the system, 
the medical terminologies, namely ATC e (Anatomical 
Therapeutic Chemical), UNII f (Unique Ingredient Identifier), 
ICD-10, ICTV g (International Committee on Taxonomy of Viruses) and 
custom encodings, are transformed to semantic entities, 
using an appropriate vocabulary, and the initial ontol- 
ogy is constructed. The ontology binds to a reasoner to 
infer relations such as inheritance and unions. This pro- 
cess is performed once offline during initialization and the 
knowledge base is available to the system for further uti- 
lization. In order to get recommendations in Panacea, a 
patient instance with the appropriate medical record data 
is created and fed to the knowledge base. The reasoning 
process enriches the patient instance with inferred knowledge, 
thus making it explicit. The set of medical rules is then 
applied to this enriched instance by a different reasoning 
process. The result of this final stage of rule-based 
reasoning is the recommendations list, which 
can be retrieved through SPARQL h querying. 

Figure 1 Panacea framework architecture and data flow. 

A key characteristic of the suggested architecture is 
that, regarding second level reasoning, the framework can 
utilize any rule-based reasoner or rule engine. Since all 
the inferred knowledge of the medical definitions and 
patient data is materialized in the knowledge base, the 
medical rules can be expressed and loaded in an appro- 
priate rule engine. The rule engine could be an ontology 
reasoner or a business rule manager with appropriate cus- 
tomizations in the data structures. This approach helps in 
bringing together the best of both worlds: semantic and 
meaningful representation of data using Semantic Web 
technologies and the maturity of traditional rule engines 
in efficiently handling complex and large amounts of 
rules. 

Use case scenario 

In order to demonstrate the benefits of the proposed 
semantic recommendation system, a use case regarding 
a possible scenario is described below. The codings in 
the parentheses represent the corresponding ICD-10 and 
ATC codes of diseases and drugs, respectively. 

An elderly man visits his family doctor complaining 
of pain in his right lower back and abdominal region, 
accompanied by fever. After appropriate clinical 
examination, he is diagnosed with right pyelonephritis 
(ICD-10: N11.0). According to the patient's medical 
history, he is suffering from chronic atrial fibrillation 
(ICD-10: I48.2) for which he receives clopidogrel (ATC: 
B01AC04), vertigo (ICD-10: H81.49) for which he receives 
cinnarizine (ATC: N07CA02), high arterial blood pressure 
(ICD-10: I10) for which he receives candesartan (ATC: 
C09CA06) and amlodipine (ATC: C08CA01), and diabetes 
mellitus (ICD-10: E11.9) for which he receives metformin 
(ATC: A10BA02) and sitagliptin (ATC: A10BH01). 
For the newly diagnosed pyelonephritis, 
the treating doctor must decide a number of things. 
Regarding the prescription for treating this new disease, 
the doctor has to decide which active substances to prescribe 
in order to treat the resulting inflammation, the 
cause of the inflammation, the back and abdominal pain 
and the resulting fever. 

However, before a decision is made, the following factors 
regarding the patient's medical history should also be 
considered: 

• There should be a check for drug-drug interactions 
among the drugs that the patient was taking before the 
onset of the new condition (the pyelonephritis). 

• There should be a check for drug-disease interactions 
between the drugs that the patient is already prescribed 
and the new condition. 

• The new prescription has to be verified not to 
have adverse effects or interactions with the 
previously prescribed medication and with the 
patient's medical history. 

It is clear that the task for the doctor can be hard and a 
misjudgment could lead to wrong prescriptions. Using an 
automated drug recommendation system can minimize 
this risk. The recommendation system will use the input 
data and the pharmaceutical rules in order to propose a 
treatment that will be safe for the patient. 

Semantic transformations 

Panacea is built on top of international standards of 
medical terminology in order to represent medical and 
pharmaceutical information. The following standard ter- 
minologies are used: 

ICD-10: International Classification of Diseases. It is 
used in Panacea for unique identification of diseases 
thus uniquely identifying drug indications and 
contraindications related to diseases. The latest 2010 
version was used in this work. 
UNII: Unique Ingredient Identifier. Used for the 
identification of active ingredients found in drugs. In 
Panacea it is used for uniquely identifying drug 
indications and contraindications related to 
ingredients. The 2013 index was used. 
ATC: The Anatomical Therapeutic Chemical 
Classification is used for the classification of drugs. In 
Panacea it is used in similar fashion to UNII. The 
latest 2013 index was used. 

ICTV: The International Committee on Taxonomy 
of Viruses indexing is used for the classification of 
viruses. In Panacea it is used in order to uniquely 
identify drug indications and contraindications related to 
viruses. The latest 2012 release was used. 

Besides these international standards, a number of 
domain classifications have been declared and used in 
order to enhance the usability of the system or to repre- 
sent data that are not included in the standards. These 
classifications act as supplementary to the standards. 

Substance: As the use of encodings for drug ingredients 
is not convenient for humans, active substances 
are identified by their common names as 
referenced in the medical literature. These names come 
from international standards such as the INN (International 
Nonproprietary Names) and others such as USAN 
(United States Adopted Name) or BAN (British Approved 
Name). Members of this identification list are substances 
such as acetazolamide or isradipine. In addition, substances 
correspond to ATC codes such that, for example, 
acetazolamide = S01EC01. The substances are the actual 
recommendations of Panacea. 

Custom Concepts: While the ATC, ICD-10, UNII and 
ICTV standards are complete, they are designed for use in 
contexts different from Panacea and drug recommendations, 
e.g. for annotation, search or information retrieval. 
As such, it is often desirable to enrich the knowledge base 
with information that, while not standard, will aid in the 
usability and overall efficiency of the system. Especially for 
the formulation of medical/pharmaceutical rules, it was found 
that on some occasions the definition of diseases, 
drugs or other concepts was either absent, incomplete or 
too general to be useful for a rule definition. An example 
of the lack of a definition in ICD-10 is the absence of 
a precise and specific code for "Chronic obstructive pulmonary 
disease" or for "Hypertrophy (benign) of prostate", 
which is under the general code N40 - Hyperplasia of 
prostate among other hyperplasia conditions. For this reason, 
a number of custom concepts have been defined. 
Examples of such concepts are disease definitions such 
as "Narcolepsy", microorganisms such as "Clostridium 
clostridiiformis" or medical acts such as "upper extremity 
arteriography". 

Custom Concept Collections: Certain "groups" of substances 
and/or diseases are frequently present in drug 
interactions and these groups are not recorded explicitly 
in any standardized classification, so it is more convenient 
for medical use to specify these custom groups. These 
often-used groups are termed "conditions" in Panacea and 
are defined by medical experts. A condition can appear as 
a premise in other condition definitions, as in the Custom 
Concept Collection cardiac-rhythm-abnormalities: 

cardiac-rhythm-abnormalities = ccbradycardia | 
icd:R00 | cctachycardia | icd:O68.0 | icd:O68.2 

where ccbradycardia is defined as "icd:I49.5 | icd:R00.1 | 
icd:O68.0" and cctachycardia as "icd:R00.0 | icd:I49.5 | 
icd:I47 | icd:O68.0". The prefix "icd:" stands for the ICD-10 
namespace. In total, 672 Custom Concept Collections have been 
defined and are used in this work. 
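
Because a collection may list other collections among its members, resolving a collection to plain classification codes is a recursive expansion. The following is a minimal Python sketch of that idea, not the Panacea implementation: the dictionary layout and the `expand` function are assumptions made for illustration, and only the three collections quoted above are encoded.

```python
# Hypothetical in-memory encoding of the collections quoted in the text.
# A member is either a classification code or the name of another collection.
COLLECTIONS = {
    "ccbradycardia": {"icd:I49.5", "icd:R00.1", "icd:O68.0"},
    "cctachycardia": {"icd:R00.0", "icd:I49.5", "icd:I47", "icd:O68.0"},
    "cardiac-rhythm-abnormalities": {
        "ccbradycardia", "icd:R00", "cctachycardia", "icd:O68.0", "icd:O68.2",
    },
}


def expand(name, collections, _seen=None):
    """Return every plain code reachable from collection `name`."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against cyclic definitions
        return set()
    seen.add(name)
    codes = set()
    for member in collections[name]:
        if member in collections:  # nested collection: recurse
            codes |= expand(member, collections, seen)
        else:                      # plain classification code
            codes.add(member)
    return codes


print(sorted(expand("cardiac-rhythm-abnormalities", COLLECTIONS)))
```

In the framework itself this membership is expressed with SKOS properties and resolved by the reasoner; the sketch only makes the nesting explicit.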

SKOS vocabulary 

In the approach followed in [9], the medical standards and 
the custom definitions were translated to OWL classes, 
primitive and defined. While this approach had the benefit 
of using the language's semantics to model the available 
information, there were problems resulting from this 
design decision. One of the major issues was the difficulty 
in scaling the system. To date, very few reasoners 
are available that can efficiently handle the amount of class 
definitions and reasoning required to run the system, both 
in terms of memory consumption and speed. 

In Panacea, a different approach was adopted. The 
SKOS i (Simple Knowledge Organization System) vocabulary 
is a W3C (World Wide Web Consortium) recommendation. 
It is built using RDFS (Resource Description 
Framework Schema) semantics and has been developed 
as a low-cost migration path for porting existing 
knowledge organization systems, such as thesauri, taxonomies, 
classification schemes and subject heading systems, 
to the Semantic Web. It enables a "lightweight" 
semantic representation of such knowledge systems and 
is a good match for the medical standards that are 
used in Panacea. As such, all the terminologies 
mentioned in the previous section have been transformed 
to the SKOS vocabulary automatically using a 
parser. 

Doulaverakis et al. Journal of Biomedical Semantics 2014, 5:13 
http://www.jbiomedsem.com/content/5/1/13 

Comparing SKOS to the approach followed in [9], 
instead of representing the ATC, ICD-10 and UNII classifications 
as top-level classes, they are now represented as 
instances of the skos:ConceptScheme class, where "skos" stands 
for the SKOS namespace. Each entry in these classifications 
is represented as an instance of the skos:Concept 
class. The OWL class hierarchy of [9] is represented 
in Panacea using the properties skos:broaderTransitive 
and skos:narrowerTransitive, while the unions of classes 
for Custom Concept Collections are represented using 
the skos:member property. Correspondence between the 
semantic transformation methodologies that were followed 
in the current work and in [9] is presented in 
Table 1. 

It is interesting to note that the SKOS vocabulary offers 
exactly what is needed in order to capture the semantics 
of the medical classifications without making sacrifices in 
expressiveness. One can argue that it can be considered 
more precise than the OWL expressions, as in the case 
of the similarity of Substances and ATC codes. This similarity 
is better represented by the skos:closeMatch relation 
than owl:equivalentClass. For Panacea a total of 64,658 
definitions of classification codes have been expressed 
using SKOS. 

Panacea ontology and reasoning 

The core ontology of Panacea is visualized in Figure 2. 
The aforementioned SKOS ontologies were imported to 
the Panacea core ontology under the MedicalDefinitions 
class. The Patient class holds the patient instances and 
is connected to the MedicalDefinitions class with the 
hasData properties. The patient recommendations, indi- 
cations and contraindications, regarding substances that 
should and should not be prescribed are expressed with 
the canTake and cannotTake properties, respectively. The 
patient's age group and sex group are expressed through 
the hasAgeGroup and hasSexGroup properties. 



Table 1 Correspondence between the semantic 
transformation in the early GalenOWL system and the 
proposed Panacea framework 

                     GalenOWL              Panacea 
Annotation           rdfs:label            skos:prefLabel 
Equivalence          owl:equivalentClass   skos:closeMatch 
Custom collections   owl:unionOf           skos:member 
Hierarchy            rdfs:subClassOf       skos:broaderTransitive 



Figure 2 Panacea ontology. 



Medical reasoning 

When querying the system for recommendations, a 
patient instance is created with the initial patient data 
(through the hasData, hasAgeGroup and hasSexGroup 
properties) and is loaded in the knowledge base. The 
reasoner, using RDFS inference and a small number 
of additional rules, infers all the implicit patient data. 
As an example, consider a patient who suffers from 
a form of thrombocytopenia. An instance is created 
with the property <pnc:patient pnc:hasData icd:D69.6>. 
Through the skos:broaderTransitive relation, the reasoner 
will infer the triples <pnc:patient pnc:hasData icd:D69>, 
<pnc:patient pnc:hasData icd:D65-D69> and <pnc:patient 
pnc:hasData icd:D50-D89>. Additionally, the custom collection 
definition pnc:ccdeficiency-bone-marrow has 
icd:D69.6 as one of its members, so the triple <pnc:patient 
pnc:hasData pnc:ccdeficiency-bone-marrow> will also be 
inferred. In the end, the patient instance will be enriched 
with all the underlying implicit information. 
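
The enrichment step described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the `BROADER` and `MEMBERS` dictionaries are hypothetical stand-ins for the SKOS knowledge base and hold just the concepts from the thrombocytopenia example, the collection name is the reading assumed in this text, and plain set operations replace the RDFS reasoner.

```python
# Hypothetical fragment of the knowledge base for the example patient.
BROADER = {  # concept -> its direct broader concept
    "icd:D69.6": "icd:D69",
    "icd:D69": "icd:D65-D69",
    "icd:D65-D69": "icd:D50-D89",
}
MEMBERS = {  # custom collection -> its members
    "pnc:ccdeficiency-bone-marrow": {"icd:D69.6"},
}


def enrich(patient_data, broader, members):
    """Return the patient data plus everything implied by it."""
    initial = set(patient_data)
    enriched = set(initial)
    # Walk up the hierarchy from every recorded concept.
    for code in initial:
        parent = broader.get(code)
        while parent is not None and parent not in enriched:
            enriched.add(parent)
            parent = broader.get(parent)
    # Add every collection that has a matching member; repeat until stable
    # so that collections nested in other collections are also found.
    changed = True
    while changed:
        changed = False
        for collection, codes in members.items():
            if collection not in enriched and enriched & codes:
                enriched.add(collection)
                changed = True
    return enriched


print(sorted(enrich({"icd:D69.6"}, BROADER, MEMBERS)))
```

The result contains the original code, its three ancestors and the matching collection, which mirrors the set of hasData triples listed in the example.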

Rule-based reasoning 

Drug recommendations in Panacea are generated using 
a rule-based approach. The rules express the indications 
and contraindications of drug substances, while their 
premises are the medical definitions and the patients' age 
and sex group. The rules use the logical operators and (&) 
and or (|) and parentheses. An example is the rule for the 
substance "lisuride", which is expressed as 

lisuride = icd:E22.0 | (icd:E22.1 & (icd:N91.0 | icd:N97)), 
ageGroup=adult or elder 

The above rule reads that the substance "lisuride" is 
recommended for adult and elder patients who suffer 
from E22.0, OR suffer from E22.1 AND one of N91.0 
OR N97. To use these rules, they have to be properly 
parsed and transformed in order to match the knowledge 
base and the patient instance that has been enriched 
with implicit knowledge. The proposed rule structure allows 
modifications to specific rules without the changes affecting the 
rest of the rule base. This enables the rule base to be kept 
up to date with the latest clinical advancements, which 
is a requirement as clinical pharmacology and medicine 
are constantly evolving. 

Analyzing Panacea's architecture 
in Figure 1, it can be seen that due to the layered 
reasoning approach, the knowledge base (medical definitions 
+ reasoner) is actually used for producing the 
enriched patient instance. This means that the instance 
can be fed to a rule reasoner which has appropriately 
loaded the medical-pharmaceutical rules, without the reasoner 
having to communicate with the knowledge base for 
further utilization. Using this approach and with proper 
modifications, any rule engine can be used to produce 
the drug recommendations. To demonstrate this ability, 
two separate rule engine integrations have been developed 
and are presented below. 

The medical rule base consists 
of 1,342 rules which were extracted and encoded directly 
from official documents, such as Summary of Product 
Characteristics (SPC) and Patient Information Leaflets 
(PIL), regarding drug indications, contraindications, interactions 
and dosage. The validity of the rule base has 
already been assessed in [9]. 
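
To make the semantics of such a rule concrete, the lisuride example can be evaluated against an enriched patient instance with a short recursive function. The nested-tuple encoding, the `LISURIDE` dictionary and the function names below are assumptions chosen for illustration; they are not the format produced by the Panacea parser.

```python
# Hypothetical encoding of the lisuride rule: strings are medical
# definitions, tuples are ("and" | "or", operand, operand, ...).
LISURIDE = {
    "substance": "lisuride",
    "age_groups": {"adult", "elder"},
    "premise": ("or", "icd:E22.0",
                ("and", "icd:E22.1", ("or", "icd:N91.0", "icd:N97"))),
}


def holds(expr, facts):
    """Evaluate a premise expression against the enriched patient data."""
    if isinstance(expr, str):
        return expr in facts
    op, *operands = expr
    if op == "and":
        return all(holds(o, facts) for o in operands)
    if op == "or":
        return any(holds(o, facts) for o in operands)
    raise ValueError(f"unknown operator: {op}")


def recommends(rule, facts, age_group):
    """True when the age group matches and the premise is satisfied."""
    return age_group in rule["age_groups"] and holds(rule["premise"], facts)


print(recommends(LISURIDE, {"icd:E22.1", "icd:N97"}, "elder"))  # True
print(recommends(LISURIDE, {"icd:E22.1"}, "adult"))             # False
```

Because each rule is evaluated only against the already enriched patient data, changing one rule does not require touching the others, which is the property the text relies on.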

It should be noted that work is under way in order to add 
more functionalities to the proposed drug recommendation 
system. One of these is the ability to offer additional 
information such as the proposed dosage for a recommended 
substance. In order to accomplish such a task, the 
pharmaceutical rules are being enriched with clinical variables, 
other than sex and age group, that are important. 
These variables include somatometric characteristics such 
as height and body weight, creatinine clearance (useful for 
calculating the dosage for antineoplastic drugs) and the 
disease itself, as a substance could be indicated at a specific 
dosage to treat a certain disease while a different dosage is 
recommended for another disease. 

Jena rule engine 

To use the rule engine of the Apache Jena API j, the rules 
had to be translated to the Jena rule language. An automated 
parser was developed for this purpose. As with most 
semantic rule reasoners, OR clauses are not allowed in a 
rule definition, so separate rules had to be expressed for 
every premise that was ORed in the original rule base. 
For example, the rule for "lisuride" was expressed by 3 
different rules: 

1. (?patient pnc:hasData icd:E22.0) -> 
   (?patient pnc:canTake sub:lisuride) 

2. (?patient pnc:hasData icd:E22.1) 
   (?patient pnc:hasData icd:N91.0) -> 
   (?patient pnc:canTake sub:lisuride) 

3. (?patient pnc:hasData icd:E22.1) 
   (?patient pnc:hasData icd:N97) -> 
   (?patient pnc:canTake sub:lisuride) 
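
Splitting a rule with OR clauses into several OR-free rules amounts to rewriting its premise in disjunctive normal form. The Python sketch below shows one way to do this; the nested-tuple encoding of the premise and the helper names are assumptions made for illustration and do not reproduce the authors' parser.

```python
from itertools import product


def to_conjunctions(expr):
    """Expand a premise into a list of OR-free bodies (sets of terms)."""
    if isinstance(expr, str):
        return [frozenset([expr])]
    op, *operands = expr
    expanded = [to_conjunctions(o) for o in operands]
    if op == "or":
        # Every alternative of every operand becomes its own body.
        return [body for alternatives in expanded for body in alternatives]
    if op == "and":
        # Combine one alternative from each operand into a single body.
        return [frozenset().union(*combo) for combo in product(*expanded)]
    raise ValueError(f"unknown operator: {op}")


def to_rule_text(substance, body):
    """Render one body in a form resembling the rules listed above."""
    premises = " ".join(f"(?patient pnc:hasData {t})" for t in sorted(body))
    return f"{premises} -> (?patient pnc:canTake {substance})"


# Hypothetical encoding of the lisuride premise.
PREMISE = ("or", "icd:E22.0",
           ("and", "icd:E22.1", ("or", "icd:N91.0", "icd:N97")))

for body in to_conjunctions(PREMISE):
    print(to_rule_text("sub:lisuride", body))
```

For the lisuride premise the expansion yields three bodies, matching the three rules listed above. Rules with several nested OR clauses multiply in the same way, which is why the expanded rule base is much larger than the original one.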

This rule expansion resulted in a total of 6,451 rules 
expressed in the Jena language. Loading the 
whole rule base and performing inference for recommendations 
proved inefficient for real-time use, requiring on 
average as much as 8 seconds. In order to tackle this issue, 
a coarse rule selection phase was introduced. The selection 
is executed in 2 iterations. During the first iteration, 
a subset A of candidate rules that match the patient's sex 
and age group is created from the initial rule base. This 
subset is selected for further processing. In the second 
iteration, rules from A that contain at least one of the 
patient's data, i.e. a SKOS term, in their premises are singled 
out and a final set R ⊂ A is created from them. 
Since the implicit knowledge extraction was already 
performed when the patient instance was introduced 
to the reasoning framework, the creation of R is 
a simple and fast process. It merely requires string matching 
and all of the processing is executed in memory. 
As a result, the overall burden added to the 
reasoning process is minimal. From the initial rule base 
of 6,451 rules it is common for R to contain as few as 
50 rules, whose evaluation is much more efficient. Rule 
execution is performed with the Jena rule engine, after which the 
patient instance is modified and contains the drug 
recommendations. These recommendations are retrieved 
through SPARQL querying, using Jena's query engine. The 
advantage of the Jena engine is that it can readily consume 
the patient instance for producing the recommendations. 
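
The two-iteration selection can be sketched in Python as two successive filters. The `Rule` record, its field names and the three sample rules are hypothetical, and set intersection stands in for the string matching mentioned above; the sketch only illustrates why the step is cheap once the patient data have been enriched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    substance: str
    premises: frozenset    # terms of an OR-free rule body
    age_groups: frozenset
    sex_groups: frozenset


def select_rules(rules, patient_terms, age_group, sex_group):
    """Coarse two-step pre-selection of candidate rules."""
    # Iteration 1: subset A, rules matching the patient's age and sex group.
    subset_a = [r for r in rules
                if age_group in r.age_groups and sex_group in r.sex_groups]
    # Iteration 2: subset R, rules of A sharing at least one term with
    # the (already enriched) patient data.
    return [r for r in subset_a if r.premises & patient_terms]


BOTH = frozenset({"male", "female"})
RULES = [  # hypothetical sample rules
    Rule("sub:lisuride", frozenset({"icd:E22.0"}),
         frozenset({"adult", "elder"}), BOTH),
    Rule("sub:lisuride", frozenset({"icd:E22.1", "icd:N97"}),
         frozenset({"adult", "elder"}), BOTH),
    Rule("sub:example", frozenset({"icd:E22.0"}),
         frozenset({"child"}), BOTH),
]

selected = select_rules(RULES, {"icd:E22.0", "icd:E22"}, "adult", "male")
print([r.premises for r in selected])
```

Only the selected rules would then be handed to the rule engine. Note that the second filter keeps a rule as soon as one premise matches, so it can let through rules that later fail to fire; it never discards a rule that could fire.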

Drools rule engine 

As an alternative approach, the Drools k business rule 
engine was used. In contrast to Jena, Drools could not 
directly use the patient instance for performing reasoning. 
For this purpose, the instance was transformed to a 
Java bean, where the properties of the ontology Patient 
class are mapped to Java methods using the JenaBean 
API l. A similar approach for integrating Jena and Drools 
was used in [10]. The Drools rule language permits the 
use of ORed clauses in the body, so the 1,342 original 
medical rules were translated to the same number of 
rules in Drools, using an automated parser similar to the 
one used in the Jena approach. For example, the rule for 
"lisuride" from the previous section was expressed 
as: 

RULE "lisuride" 
WHEN 
    p: Patient(data: hasData) 
    exists ( 
        (MedicalDef(uri==icd:E22.0) from data) || 
        (MedicalDef(uri==icd:E22.1 && uri==icd:N91.0) from data) || 
        (MedicalDef(uri==icd:E22.1 && uri==icd:N97) from data) 
    ) 
THEN 
    Substance lisuride = (Substance) JenaBean.reader().load(sub:lisuride); 
    modify(p) {p.canTake(lisuride)} 
END 

Execution was straightforward with no preprocessing 
required. Drools is optimized for handling large rule 
bases, so no rule pre-selection step was required, as this 
would have had little impact on reasoning efficiency. The result 
of this reasoning process is a modified patient Java bean 
with the drug recommendations. The Java bean is transformed 
to a Jena model instance, so that SPARQL querying 
for retrieving the recommendations is possible. What this 
approach demonstrates is that it is possible to integrate 
business rule engines as reasoners in the framework, thus 
combining the high efficiency and optimizations 
of these engines with the semantic description and 
interpretation of data. 

Evaluation and discussion 

The evaluation of Panacea was performed using two dif- 
ferent approaches. One approach assesses the quality of 
drug recommendations while the other assesses the effi- 
ciency of the developed system in terms of computational 
requirements and performance which are of importance 
when a system is launched in a production environment. 
Both approaches are detailed in the following sections. 

Quality evaluation 

The first approach involved an evaluation against real 
clinical data from patients treated in a hospital environment. 
21 anonymized patient medical record files (cases) 
from the AHEPA m University Hospital of Thessaloniki 
were gathered and an analysis was performed on them. 
Data were gathered regarding the patients' medical history 
(medication that the patients were taking and active diseases 
that they suffered from), the diagnosis related to the condition 
that led the patients to visit the hospital, and the medication 
that was actually prescribed. These data formed the 
basis against which the prescription recommendations 
generated by Panacea were compared. In addition, 
all patient data (existing diseases, current medication, 
newly diagnosed disease(s), new medication) were used by 
Panacea in order to discover possible interactions and/or 
contraindications that were either missed or disregarded 
by the treating physicians. As such, the comparison 
identified the following: 

• Average number of identified drug-drug interactions 
per case 

• Average number of identified drug-drug 
contraindications per case 

• Average number of identified drug-disease 
interactions per case 

• Average number of identified drug-disease 
contraindications per case 

• Percentage of agreement between the automatically 
generated drug recommendations vs actual 
prescription for all cases, i.e. how many of the 
recommendations Panacea generated agreed with the 
prescribed drugs, for the newly diagnosed disease(s). 

Although it does not affect the final results, it should be 
noted that the evaluation took place using Drools as the 
rule engine. The identified interactions and contraindications 
per case are displayed in Table 2. For most of the 
cases where contraindicated drugs were prescribed, this was 
either done deliberately, e.g. a patient suffering from brain 
ischemia was administered tinzaparin and acetylsalicylic 
acid for their (normally contraindicated) enhanced blood 
thinning effect when combined, or because the contraindications 
were deemed unimportant relative to the 
patient's critical condition. However, there were a few 
cases where the administration of contraindicated drugs 
could not be justified. For these cases we can assume that 
the physicians lacked the relevant knowledge or made an error 
during prescription. The use of an automated recommendation 
system, such as the one presented, could have 
prevented such errors. 

At the next stage of the evaluation, the patients'
medical history (active diseases and current medication)
and the current diagnosis were taken into account in order
to generate automatic recommendations from Panacea.
These recommendations were compared to the prescriptions
that the patients received from the hospital. In this
comparison, Panacea matched 67.1% of the prescribed drugs
(Table 3). While a level close to 100% would have been
expected in a fully controlled environment where all
prescriptions are given according to formal



Table 2 Discovery of drug interactions and contraindications against real clinical data

Metric                                Average
Drug-drug interactions/case           3.28
Drug-drug contraind./case             0.43
Drug-disease interactions/case        4.48
Drug-disease contraind./case          0.24
Actual prescribed drugs/case          7.38

All numbers are averages.
Table 3 Matching recommendations

Total administered drugs                76
Matching recommendations                51
Av. recommendations/case                62
Percentage (matching/administered)      67.1%



medical guidelines, this was not achieved because of
several factors whose role was revealed during the
evaluation. On average, Panacea recommended 62 different
drugs per case.
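
As a minimal sketch of how the agreement figure can be computed, the following Python snippet counts administered drugs that also appear among the recommendations for the same case. The function name and the per-case dictionaries are illustrative assumptions and not part of Panacea; only the aggregate figures (51 matches out of 76 administered drugs) come from Table 3.

```python
from typing import Dict, Set


def matching_percentage(recommended: Dict[str, Set[str]],
                        administered: Dict[str, Set[str]]) -> float:
    """Percentage of administered drugs that were also recommended.

    Both arguments map a (hypothetical) case identifier to a set of drug names.
    """
    total = sum(len(drugs) for drugs in administered.values())
    if total == 0:
        raise ValueError("no administered drugs to compare against")
    # A drug counts as matched when it is recommended for the same case.
    matched = sum(len(drugs & recommended.get(case, set()))
                  for case, drugs in administered.items())
    return 100.0 * matched / total


# Aggregate figures reported in Table 3.
table3_percentage = round(100.0 * 51 / 76, 1)
```

Because Panacea recommends many more drugs per case than are prescribed, this measure reflects coverage of the prescriptions rather than precision of the recommendations.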

Several factors that affected the recommendation
results have been identified. One of them is that, in their
current form, Panacea's rules expressing drug interactions
are binary: an interaction either exists or it does not,
and there is no means of expressing its severity (how
important it is considered). The presence of an
interaction, however unimportant, will exclude a relevant
active substance from the recommendations, although in
principle the benefit of administering the substance might
outweigh a possible side effect. An example in the current
evaluation dataset is the administration of the substance
Budesonide to a patient whose current medication contains
Acetylsalicylic acid and who has been newly diagnosed with
"Pneumonia". The treating physician decided that the
benefit from receiving Budesonide was greater than the
risk of possible interactions. Experienced physicians are
in a position to make this kind of judgment effectively.
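
The effect of binary interaction rules can be illustrated with a small Python sketch. It contrasts the behaviour described above (any interaction excludes the substance) with a variant that uses a severity weight. The `severity` field, the threshold and the value 0.2 are purely hypothetical and are not part of Panacea's current rule base.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Interaction:
    substance: str        # candidate substance for recommendation
    interacts_with: str   # substance already in the patient's medication
    severity: float       # hypothetical weight in [0, 1], assumed for illustration


def recommend_binary(candidates: Iterable[str],
                     current_medication: Set[str],
                     interactions: Iterable[Interaction]) -> List[str]:
    """Binary behaviour: any known interaction excludes the candidate."""
    blocked = {i.substance for i in interactions
               if i.interacts_with in current_medication}
    return [c for c in candidates if c not in blocked]


def recommend_weighted(candidates: Iterable[str],
                       current_medication: Set[str],
                       interactions: Iterable[Interaction],
                       threshold: float) -> List[str]:
    """Hypothetical variant: only interactions at or above the threshold exclude."""
    blocked = {i.substance for i in interactions
               if i.interacts_with in current_medication
               and i.severity >= threshold}
    return [c for c in candidates if c not in blocked]


# The severity value below is invented for the example.
example_interactions = [Interaction("Budesonide", "Acetylsalicylic acid", 0.2)]
example_medication = {"Acetylsalicylic acid"}

binary_result = recommend_binary(["Budesonide"], example_medication,
                                 example_interactions)
weighted_result = recommend_weighted(["Budesonide"], example_medication,
                                     example_interactions, threshold=0.5)
```

Under these assumptions the binary filter drops Budesonide, while the weighted filter keeps it because the assumed severity is below the threshold.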

Secondly, there is the possibility that patient data might
have been logged inaccurately, e.g. a patient was suffering
from Sepsis due to Staphylococcus aureus (ICD-10: A41.0)
while Sepsis (ICD-10: A41) was registered as the diagnosis,
or that some information from the diagnosis is missing,
e.g. a patient was diagnosed with "Pneumonia", but the
causative bacterium (staphylococcus) was omitted. Such
inaccuracies or missing information affect the automated
recommendation results, as the recommendation rules
have to match all premises in order to produce results.
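
A minimal Python sketch of this all-premises requirement follows. The `Rule` class, the fact strings and the recommendation label are placeholders chosen for illustration; they do not reproduce Panacea's actual rule syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Rule:
    premises: FrozenSet[str]   # facts that must all be present for the rule to fire
    recommendation: str        # placeholder label, not a real prescription


def fire(rules: Iterable[Rule], patient_facts: Iterable[str]) -> List[str]:
    """Return the recommendations of rules whose premises are all satisfied."""
    facts = frozenset(patient_facts)
    return [r.recommendation for r in rules if r.premises <= facts]


# Hypothetical rule keyed on the specific ICD-10 code A41.0.
rules = [Rule(frozenset({"ICD10:A41.0"}), "treatment-for-A41.0")]

# Diagnosis logged only as the general code A41: the rule does not fire.
general_result = fire(rules, ["ICD10:A41"])

# Diagnosis logged with the precise code A41.0: the rule fires.
precise_result = fire(rules, ["ICD10:A41.0"])
```

The sketch shows why a diagnosis recorded at a coarser level than the rule expects yields no recommendation, even though the patient's actual condition would have matched.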

An additional cause of decreased accuracy has been
mentioned earlier in this section: contraindicated
drugs may be prescribed because they cause a
desirable side effect that benefits the patient,
as in the case of simultaneous delivery of tinzaparin and
acetylsalicylic acid to a patient with ischemic stroke.
In these situations especially, the experience of the
treating physician plays an important role in the decision.
Panacea is analogous to an inexperienced physician who
goes "by the book", in contrast to an experienced physician
who has the knowledge to make a successful compromise
between possible risks (from interactions) and benefits
(from the drug).

The above evaluation gave useful insights into ways in
which Panacea could be improved in order to be a valuable
tool in a physician's arsenal. Panacea has proven quite
effective in identifying drug interactions and
contraindications. Regarding the potential for actual drug
recommendation, although Panacea managed to propose
various possible treating drugs according to medical
record data, these recommendations in some cases differed
from the actual drug prescription that the patients
received. Several factors that influenced the results have
been identified. One major issue seems to be that in many
cases the diagnosis is given as a general and unclear
statement, e.g. respiratory infections or sepsis, while
recommendation rules are concretely structured and give
answers to specific disease diagnoses. For such
recommendation systems to be effective, diagnoses as
general as "respiratory infection" are not adequate. The
respiratory system starts at the nostrils and ends in the
pulmonary alveoli, and there are numerous ICD-10 codes
that describe every individual infection in the
respiratory system, as there are numerous bacteria or
viruses that cause these infections. These combinations of
diseases and causes are precisely encoded in Panacea's
rules, and a precise diagnosis would generate the exact
recommendation. However, a treating physician is often not
in a position to know the exact location and cause of an
infection, so it is common to prescribe drugs that cover
most of the possible combinations. This generalization of
something specific, e.g. pneumonia due to streptococcus,
to something more general, e.g. respiratory infection, is
what affects the recommendation results. Issues such as
the above should be taken into consideration during the
future development of Panacea.

Performance evaluation 

To evaluate the framework in terms of performance,
the two approaches for the final reasoning stage were
compared with each other and with GalenOWL (with values
taken from [9]). The comparisons focused on the
usability of the framework in a production environment,
as the rule base has been validated in [9]. Three
parameters were measured: initialization time, i.e. the
time to get the system up and running; memory consumption
after initialization; and query response time, i.e. the
time needed to execute the rule base and retrieve the
results. Results are shown in Table 4.

Some points in the table results merit discussion.
Initialization involves loading the ontology into memory,
performing inference, and preparing the medical rule base
for patient data reasoning. In the Jena implementation, the
rule base is processed and loaded only after the patient
instance has been introduced to the system, while the
Drools implementation loads the whole rule base into the
engine before any patient data are introduced. As a result,
Drools appears slower than the Jena approach regarding
initialization. For the same reason, memory consumption 
Table 4 Evaluation between the 2 Panacea reasoning approaches and GalenOWL

                              Panacea-Jena   Panacea-Drools   GalenOWL
Initialization time           32.0 s         34.7 s           148 s
Memory consumption            169 MB         280 MB           649 MB
  of which rule base consumes 0 MB           111 MB           -
Query response time           47 ms          5 ms             16 ms



appears greater for Drools. This metric corresponds to
memory consumption from initialization to the retrieval of
recommendations. While in Drools the whole rule base is
loaded into memory, in Jena the approach was to load a
small subset of the rule base that could possibly match the
patient data, which leads to a smaller memory footprint.
Finally, for query response the advantage is with Drools,
as was expected, mainly because Drools is a
dedicated rule engine while Jena's focus is not on providing
a state-of-the-art reasoner and rule engine, but a versatile
API for ontology management.
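
The idea of loading only the rules that could possibly match the patient data can be sketched in Python as a two-step selection. This is an illustration under stated assumptions: the `RuleIndex` class and the rule and fact names are hypothetical, and the code uses neither the Jena nor the Drools API.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


class RuleIndex:
    """Index rules by premise so that only plausible candidates are evaluated."""

    def __init__(self, rules: Iterable[Tuple[str, Iterable[str]]]) -> None:
        self._rules: Dict[str, FrozenSet[str]] = {}
        self._by_premise: Dict[str, Set[str]] = defaultdict(set)
        for name, premises in rules:
            self._rules[name] = frozenset(premises)
            for premise in self._rules[name]:
                self._by_premise[premise].add(name)

    def candidates(self, facts: Iterable[str]) -> Set[str]:
        """Step 1: select rules that share at least one premise with the facts."""
        selected: Set[str] = set()
        for fact in facts:
            selected |= self._by_premise.get(fact, set())
        return selected

    def matches(self, facts: Iterable[str]) -> List[str]:
        """Step 2: fully evaluate only the selected candidates."""
        fact_set = frozenset(facts)
        return sorted(name for name in self.candidates(fact_set)
                      if self._rules[name] <= fact_set)


# Hypothetical rule base and patient facts.
index = RuleIndex([
    ("r1", ["disease:X", "drug:A"]),
    ("r2", ["disease:Y"]),
    ("r3", ["disease:X", "drug:B"]),
])
patient = ["disease:X", "drug:A"]
selected = index.candidates(patient)
fired = index.matches(patient)
```

In this sketch only the selected candidates need to be handed to the rule engine, which is one way to obtain the smaller memory footprint at the price of doing the selection at query time.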

Numerically, the Jena approach seems to be more
efficient than Drools, apart from query execution time,
where the difference is not significant. However,
while Jena seems to perform better for the present
knowledge base, this could change as more and
more rules are added. It is estimated that eventually, at
its final stage, Panacea will incorporate more than 9,000
drug-drug and drug-disease interactions. As already noted,
Jena is focused more on being an ontology API than an
efficient rule engine, which could eventually lead to
scaling problems. On the other hand, scaling with Drools
is not an issue. The value of business rule engines as
Semantic Web reasoners has been previously exploited
in approaches such as [11], where the authors
implemented two OWL2-RL [12] reasoners using the Drools
and Jess rule engines respectively. The use of traditional
rule engines with Semantic Web technologies brings
together the best of both worlds, i.e. increased efficiency
coupled with interoperability and semantic annotation of
information.

What is also noticeable from Table 4 is the decreased
memory requirement of Panacea compared to the previous
OWL-based GalenOWL system, although the two
approaches offer very similar functionality. As a result,
Panacea can accommodate a far greater knowledge base,
which supports the claim of increased scalability.
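
The relative differences can be made explicit with a short calculation over the values in Table 4. The dictionary layout and the helper name `ratio_to_galenowl` are assumptions made for this sketch; the numbers are those reported in the table.

```python
from typing import Dict

# Values copied from Table 4 (seconds, megabytes, milliseconds).
table4: Dict[str, Dict[str, float]] = {
    "Panacea-Jena":   {"init_s": 32.0,  "memory_mb": 169.0, "query_ms": 47.0},
    "Panacea-Drools": {"init_s": 34.7,  "memory_mb": 280.0, "query_ms": 5.0},
    "GalenOWL":       {"init_s": 148.0, "memory_mb": 649.0, "query_ms": 16.0},
}

# Rule base share of memory for Drools, as reported in Table 4.
drools_rule_base_mb = 111.0


def ratio_to_galenowl(metric: str) -> Dict[str, float]:
    """How many times larger the GalenOWL value is than each Panacea value."""
    base = table4["GalenOWL"][metric]
    return {name: round(base / row[metric], 1)
            for name, row in table4.items() if name != "GalenOWL"}


memory_ratio = ratio_to_galenowl("memory_mb")
init_ratio = ratio_to_galenowl("init_s")

# Memory used by Drools once the rule base is subtracted.
drools_without_rules = table4["Panacea-Drools"]["memory_mb"] - drools_rule_base_mb
```

Subtracting the rule base from the Drools figure gives the same 169 MB as the Jena variant, which suggests that the difference between the two Panacea configurations is attributable to the eagerly loaded rule base.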

Panacea will eventually be offered as a service, with
health care professionals as potential customers.
Other possible exploitation routes are being investigated,
such as integration into patient management systems in
health clinics. The use of personalized drug prescription
systems such as Panacea in everyday practice will benefit
society and the economy. A major benefit
from the use of such systems is the reduction of medical
costs through the rational drug prescriptions that
personalized drug prescription allows [13]. Another benefit
is a positive effect on public health, with a reduction in
incidents related to drug interactions or adverse effects
[14]. All knowledge regarding drug information is encoded
and available to the experts to aid them during
prescription, so that such systems act as decision support
systems. It should be stressed that drug recommendation
systems do not aim to replace medical experts but to
support them in their practice.

A limitation of the proposed approach is that a rather
large amount of manual effort by experts is required in
order to populate and enrich the rule base. Although the
semantic technologies that have been employed can make
rule authoring simpler, no automated method for
pharmaceutical rule generation has been integrated. However,
one could argue that since rule authoring is performed
by experts, the rules are verified and guaranteed to
be correct. Even if an automated method, such as rule
mining, had been implemented, the generated rules would
still have to be verified by an expert in the field. Manual
verification, although less intensive, would still be
required.

Conclusions 

The paper presented Panacea, a framework for
semantic-enabled drug recommendation discovery. The
framework utilizes a layered reasoning approach where the
medical ontology and the patient data instances are fed
to an extended RDFS reasoner in order to infer implicit
knowledge. Drug recommendations are generated in
the second reasoning layer, where any common rule engine
can be used. As a proof-of-concept implementation, the
Jena reasoner and the Drools rule engine have been
integrated. Two different evaluations were conducted: a
performance evaluation regarding the requirements and
efficiency of the proposed approach, and a quality
evaluation regarding the system's outcome on real clinical
data. The quality evaluation gave insights regarding
possible extensions that could make the system more in line
with current clinical practice. Future work on Panacea
will focus on addressing the issues uncovered during the
quality evaluation and on providing results
that more closely match a physician's decision. These
could include improvements such as the weighting of
interactions and contraindications according to their
severity and probabilistic inference based on these
weights. To this end, Drools is being extended with a fuzzy
reasoning engine [15], which, although still in
development, is actively supported and mature enough to
be used as a testing framework. Finally, the addition
of dosage recommendations to the rules is ongoing
work.

Endnotes 

a OWL - Web Ontology Language,
http://www.w3.org/TR/owl2-overview/

b RuleML - Rule Markup Language, http://www.ruleml.org

c SWRL - Semantic Web Rule Language,
http://www.w3.org/Submission/SWRL/

d ICD-10 - International Classification of Diseases,
http://www.who.int/classifications/icd/en/

e ATC - Anatomical Therapeutic Chemical classification,
http://www.whocc.no/atc/structure_and_principles/

f UNII - Unique Ingredient Identifier,
http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm

g ICTV - International Virus Taxonomy,
http://www.ictvonline.org/virusTaxonomy.asp

h SPARQL - Query language for RDF,
http://www.w3.org/TR/rdf-sparql-query/

i SKOS - Simple Knowledge Organization System,
http://www.w3.org/2009/08/skos-reference/skos.html

j Apache Jena - Semantic Web framework,
http://jena.apache.org/

k Drools - Business logic integration platform,
http://www.jboss.org/drools

l JenaBean API, http://code.google.com/p/jenabean/

m AHEPA University Hospital of Thessaloniki,
http://www.ahepahosp.gr